Geometry of Policy Improvement
We investigate the geometry of optimal memoryless, time-independent decision
making in relation to the amount of information that the acting agent has about
the state of the system. We show that the expected long-term reward, discounted
or per time step, is maximized by policies that randomize among at most as many
actions as there are world states consistent with the agent's observation.
Moreover, we show that the expected reward per time step can be
studied in terms of the expected discounted reward. Our main tool is a
geometric version of the policy improvement lemma, which identifies a
polyhedral cone of policy changes in which the state value function increases
for all states.
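The policy improvement lemma the paper builds on can be illustrated on a toy MDP. The following sketch uses made-up transition and reward numbers (not taken from the paper) and checks that one greedy improvement step weakly increases the value in every state:

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative numbers only):
# P[a, s, s'] are transition probabilities, R[s, a] are rewards.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def policy_value(pi):
    """Exact discounted value of a deterministic policy pi (array of actions)."""
    P_pi = np.array([P[pi[s], s] for s in range(2)])
    R_pi = np.array([R[s, pi[s]] for s in range(2)])
    return np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)

pi = np.array([0, 0])              # arbitrary initial policy
V = policy_value(pi)
Q = R + gamma * (P @ V).T          # Q[s, a] under the current value function
pi_new = Q.argmax(axis=1)          # greedy policy improvement step
V_new = policy_value(pi_new)
# Policy improvement lemma: the value increases (weakly) in every state.
assert np.all(V_new >= V - 1e-9)
```

The lemma guarantees the elementwise inequality, not merely an improvement of the average value, which is what makes the geometric cone-of-improving-directions view possible.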
Two semi-Lagrangian fast methods for Hamilton-Jacobi-Bellman equations
In this paper we apply the Fast Iterative Method (FIM) for solving general
Hamilton-Jacobi-Bellman (HJB) equations and we compare the results with an
accelerated version of the Fast Sweeping Method (FSM). We find that FIM can
indeed be used to solve HJB equations without relevant modifications to the
original algorithm proposed for the eikonal equation, and that it
outperforms FSM in many cases. Observing the evolution of the active list of
nodes for FIM, we recover another numerical validation of the arguments
recently discussed in [Cacace et al., SISC 36 (2014), A570-A587] about the
impossibility of creating local single-pass methods for HJB equations.
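For readers unfamiliar with FIM, here is a minimal sketch of the method for the plain eikonal equation |∇u|·speed = 1 on a uniform grid: an unsorted active list is swept repeatedly with a Godunov upwind local solver, and neighbours of converged nodes are re-activated when they can still improve. This is an illustrative reimplementation of the baseline algorithm, not the accelerated HJB variants compared in the paper.

```python
import numpy as np

def fim_eikonal(speed, sources, h=1.0, tol=1e-9):
    """Fast Iterative Method sketch: active-list iteration with a Godunov
    upwind local solver for |grad u| * speed = 1 on a 2D grid."""
    n, m = speed.shape
    INF = np.inf
    u = np.full((n, m), INF)
    for s in sources:
        u[s] = 0.0

    def local(i, j):
        # Smallest upwind neighbour value along each axis.
        ax = min(u[i - 1, j] if i > 0 else INF, u[i + 1, j] if i < n - 1 else INF)
        ay = min(u[i, j - 1] if j > 0 else INF, u[i, j + 1] if j < m - 1 else INF)
        a, b = min(ax, ay), max(ax, ay)
        c = h / speed[i, j]
        if b - a >= c:                       # causal along one axis only
            return a + c
        return 0.5 * (a + b + np.sqrt(2 * c * c - (a - b) ** 2))

    def neighbors(i, j):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < n and 0 <= j + dj < m:
                yield i + di, j + dj

    active = set()
    for s in sources:
        active |= set(neighbors(*s))
    while active:
        nxt = set()
        for i, j in active:
            new = local(i, j)
            if abs(u[i, j] - new) > tol:     # still changing: update and keep
                u[i, j] = new
                nxt.add((i, j))
            else:                            # converged: wake improvable neighbours
                for p, q in neighbors(i, j):
                    if local(p, q) < u[p, q] - tol:
                        nxt.add((p, q))
        active = nxt
    return u

u = fim_eikonal(np.ones((5, 5)), [(0, 0)])   # unit speed, point source at a corner
```

Unlike FSM, which sweeps the whole grid in alternating orders, only the nodes on the active list are touched; the evolution of this list is exactly what the paper observes to argue about single-pass impossibility.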
Evolutionary game of coalition building under external pressure
We study the fragmentation-coagulation (or merging and splitting)
evolutionary control model introduced recently by one of the authors, in which
small players can form coalitions to resist the pressure exerted by the
principal. It is a continuous-time Markov chain, and the players have a
common reward to optimize. We study the behavior as the number of small
players grows and show that the problem converges to a (one-player)
deterministic optimization problem in continuous time, in an
infinite-dimensional state space.
Pseudorehearsal in value function approximation
Catastrophic forgetting is of special importance in reinforcement learning,
as the data distribution is generally non-stationary over time. We study and
compare several pseudorehearsal approaches for Q-learning with function
approximation in a pole balancing task. We have found that pseudorehearsal
seems to assist learning even in such very simple problems, given proper
initialization of the rehearsal parameters.
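The pseudorehearsal idea (sample random pseudo-inputs, freeze the network's current outputs on them as pseudo-targets, and mix those pseudo-items into later training) can be sketched with a linear approximator standing in for the paper's Q-learner. All tasks, dimensions, and learning rates below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(W, X):
    return X @ W.T

def sgd_step(W, X, y, lr=0.01):
    err = predict(W, X) - y
    return W - lr * (err.T @ X) / len(X)

W = rng.normal(scale=0.1, size=(1, 4))        # tiny linear "network"

# Task A: fit a fixed linear target function.
wA = np.array([[1.0, -2.0, 0.5, 0.0]])
XA = rng.normal(size=(200, 4))
yA = XA @ wA.T
for _ in range(1000):
    W = sgd_step(W, XA, yA)

# Pseudorehearsal: random pseudo-inputs, current outputs as pseudo-targets.
Xp = rng.normal(size=(200, 4))
yp = predict(W, Xp)

# Task B: a different target function on a shifted input region.
wB = np.array([[0.0, 1.0, 1.0, -1.0]])
XB = rng.normal(size=(200, 4)) + 3.0
yB = XB @ wB.T

W_plain, W_rehearse = W.copy(), W.copy()
for _ in range(1000):
    W_plain = sgd_step(W_plain, XB, yB)       # trains on B only, forgets A
    W_rehearse = sgd_step(W_rehearse,
                          np.vstack([XB, Xp]), np.vstack([yB, yp]))

errA_plain = np.mean((predict(W_plain, XA) - yA) ** 2)
errA_rehearse = np.mean((predict(W_rehearse, XA) - yA) ** 2)
assert errA_rehearse < errA_plain    # rehearsal preserves task A better
```

The pseudo-items anchor the function on regions the new data does not cover, which is the mechanism by which pseudorehearsal counters the non-stationary data distribution of reinforcement learning.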
Exploring Graphs with Time Constraints by Unreliable Collections of Mobile Robots
A graph environment must be explored by a collection of mobile robots. Some
of the robots, a priori unknown, may turn out to be unreliable. The graph is
weighted and each node is assigned a deadline. The exploration is successful if
each node of the graph is visited before its deadline by a reliable robot. The
edge weight corresponds to the time needed by a robot to traverse the edge.
Given the number of robots that may crash, is it possible to design an
algorithm that always guarantees the exploration, regardless of which subset
of robots the adversary chooses to be unreliable? We find the optimal time
within which the graph may be explored. Our approach also permits finding the
maximal number of robots that may turn out to be unreliable while the graph is
still guaranteed to be explored.
We concentrate on line graphs and rings, for which we give positive results.
We start with the case of collections involving only reliable robots. We
give algorithms finding optimal times needed for exploration when the robots
are assigned to fixed initial positions as well as when such starting positions
may be determined by the algorithm. We extend our consideration to the case
when some number of robots may be unreliable. Our most surprising result is
that the line exploration problem with robots at given initial positions, some
of which may be crash-faulty, is NP-hard. The same problem admits
polynomial-time solutions for a ring, and also for a line when the robots'
initial positions may be chosen arbitrarily.
The exploration problem is shown to be NP-hard for star graphs, even when the
team consists of only two reliable robots.
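The fault-tolerance condition can be made concrete with a brute-force feasibility checker for tiny line instances. Assuming each robot follows a fixed walk, the worst case is that the f unreliable robots crash at time zero, so a guarantee requires every node to be visited on time by at least f + 1 robots. This is an illustration of the problem statement only, not the paper's algorithms (which, as the abstract notes, cannot be efficient in general for the line with fixed positions):

```python
from itertools import permutations, product

def visit_times(start, order, pos):
    """First-visit times of a robot starting at pos[start] that visits the
    line nodes in the given order (travel time = distance along the line)."""
    t, cur, times = 0.0, pos[start], {}
    for v in order:
        t += abs(pos[v] - cur)
        cur = pos[v]
        times.setdefault(v, t)
    return times

def feasible(pos, deadline, starts, f):
    """Brute force over node orderings: can each robot be assigned a walk so
    that every node is visited by at least f + 1 robots before its deadline,
    tolerating any f robots crashing at time zero?"""
    n = len(pos)
    options = []
    for s in starts:
        covers = set()
        for order in permutations(range(n)):
            times = visit_times(s, order, pos)
            covers.add(frozenset(v for v, t in times.items() if t <= deadline[v]))
        options.append(covers)
    return any(
        all(sum(v in c for c in choice) >= f + 1 for v in range(n))
        for choice in product(*options)
    )
```

On the three-node line with positions [0, 1, 2] and generous deadlines, one robot at node 0 suffices when f = 0 but not when f = 1, while two robots starting at the endpoints tolerate one crash.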
Social learning against data falsification in sensor networks
Sensor networks generate large amounts of geographically distributed data. The conventional approach to exploiting these data is to first gather them in a special node that then performs processing and inference. However, what happens if this node is destroyed, or even worse, if it is hijacked? To explore this problem, in this work we consider a smart attacker who can take control of critical nodes within the network and use them to inject false information. In order to face this critical security threat, we propose a novel scheme that enables data aggregation and decision-making over networks based on social learning, where the sensor nodes act in a way that resembles how agents make decisions in social networks. Our results suggest that social learning enables high network resilience, even when a significant portion of the nodes has been compromised by the attacker.
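The flavour of sequential social learning can be conveyed with a toy chain of nodes, each combining its private evidence with the decisions broadcast by earlier nodes. This sketch is not the paper's protocol: the decision weight q, the signal accuracy p, and the always-lying attacker model are simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_chain(n, p=0.8, q=0.6, frac_bad=0.0, theta=1):
    """Each honest node adds its private log-likelihood ratio to the history
    of public decisions, each earlier decision being treated (simplifying
    assumption) like an independent signal of accuracy q. Compromised nodes
    always broadcast the wrong decision. Returns honest-node accuracy."""
    llr_sig = np.log(p / (1 - p))      # weight of a private signal
    llr_dec = np.log(q / (1 - q))      # weight of an earlier public decision
    bad = rng.random(n) < frac_bad
    decisions, honest_ok = [], []
    for i in range(n):
        if bad[i]:
            decisions.append(1 - theta)          # injected false report
            continue
        s = theta if rng.random() < p else 1 - theta
        L = llr_sig * (1 if s == 1 else -1)
        L += llr_dec * sum(1 if d == 1 else -1 for d in decisions)
        d = 1 if L > 0 else 0
        decisions.append(d)
        honest_ok.append(d == theta)
    return float(np.mean(honest_ok))

acc_clean = run_chain(300, frac_bad=0.0)
acc_attacked = run_chain(300, frac_bad=0.3)
assert acc_clean > 0.6   # honest nodes settle on the true state
```

Because each broadcast decision carries a bounded weight, a minority of false reports can only shift the running evidence by a bounded amount per node, which is one intuition for the resilience the abstract reports; a single hijacked fusion center has no such bound.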
A Semi-Lagrangian scheme for a modified version of the Hughes model for pedestrian flow
In this paper we present a Semi-Lagrangian scheme for a regularized version
of the Hughes model for pedestrian flow. Hughes originally proposed a coupled
nonlinear PDE system describing the evolution of a large pedestrian group
trying to exit a domain as fast as possible. The original model corresponds to
a system of a conservation law for the pedestrian density and an Eikonal
equation to determine the weighted distance to the exit. We consider this model
in presence of small diffusion and discuss the numerical analysis of the
proposed Semi-Lagrangian scheme. Furthermore, we illustrate the effect of
small diffusion on the exit time with various numerical experiments.
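The core semi-Lagrangian update (follow the characteristics backward one time step and interpolate) can be sketched for plain 1D advection with a constant walking speed toward an exit. The full scheme for the Hughes model couples this step with the diffusion term and an eikonal solve for the weighted exit distance; the velocity field and all numbers here are illustrative only.

```python
import numpy as np

# 1D semi-Lagrangian advection sketch on [0, 1] with exit at x = 0.
nx, dt, nsteps = 200, 0.01, 30
x = np.linspace(0.0, 1.0, nx)
rho = np.exp(-200.0 * (x - 0.7) ** 2)     # initial pedestrian density bump
v = -np.ones(nx)                          # everyone walks toward the exit

for _ in range(nsteps):
    feet = x - dt * v                     # departure points of characteristics
    rho = np.interp(feet, x, rho)         # interpolate density at those points

center = float(np.sum(x * rho) / np.sum(rho))   # bump has moved toward the exit
```

Because the update traces characteristics rather than differencing fluxes, the scheme remains stable for time steps not restricted by a CFL condition, which is one reason semi-Lagrangian methods are attractive for this coupled system.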
Building collaboration in multi-agent systems using reinforcement learning
© Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through the standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be achieved among the agents via competition, where the agents are expected to balance their actions in such a way that none of them drifts away from the team and none intrudes into a fellow neighbour's territory. Particles are equipped with Q-learning for self-training, to learn how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.
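The "stay with the team, stay out of the neighbour's territory" behaviour can be caricatured with a single tabular Q-learning agent, as a minimal sketch of the learning component. The line world, territory bounds, and reward shaping below are hypothetical; the paper embeds Q-learning inside a particle swarm rather than in this toy environment.

```python
import numpy as np

rng = np.random.default_rng(0)

# One agent on a 10-cell line learns to stay inside its territory, cells
# 3..6: neither drifting away nor entering a neighbour's cells.
N = 10
ACTIONS = (-1, 0, 1)                       # left, stay, right
Q = np.zeros((N, len(ACTIONS)))
alpha, gamma, eps = 0.2, 0.9, 0.1

def reward(s):
    return 1.0 if 3 <= s <= 6 else -1.0    # hypothetical territory shaping

s = 0
for step in range(50000):
    a = int(rng.integers(3)) if rng.random() < eps else int(Q[s].argmax())
    s2 = min(N - 1, max(0, s + ACTIONS[a]))
    Q[s, a] += alpha * (reward(s2) + gamma * Q[s2].max() - Q[s, a])
    s = s2
    if step % 250 == 0:                    # restarts so every cell gets visited
        s = int(rng.integers(N))

policy = Q.argmax(axis=1)
# Greedy rollouts from every cell should end up inside the territory.
for s0 in range(N):
    s = s0
    for _ in range(20):
        s = min(N - 1, max(0, s + ACTIONS[int(policy[s])]))
    assert 3 <= s <= 6
```

In the swarm setting of the paper, each particle would run such an update with its own territory and with neighbours' positions entering the state, so that balanced, collision-free collective behaviour emerges from individually learned policies.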